In reinforcement learning (RL), a model-free algorithm (as opposed to a model-based one) is an algorithm that does not use the ''transition probability distribution'' (and the ''reward function'') associated with the Markov decision process (MDP), which, in RL, represents the problem to be solved. The transition probability distribution (or transition model) and the reward function are often collectively called the "model" of the environment (or MDP), hence the name "model-free". Instead of consulting these functions, a model-free algorithm learns a value function or a policy directly from sampled experience: the states, actions, rewards and next states it observes while interacting with the environment. A model-free RL algorithm can therefore be thought of as an "explicit" trial-and-error algorithm. An example of a model-free algorithm is Q-learning.
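To make the definition concrete, here is a minimal tabular Q-learning sketch. The environment (a five-state chain in which a move goes the opposite way 10% of the time), the function names `step`, `greedy` and `q_learning`, and the hyperparameters are all hypothetical choices for illustration. Notice that the update rule only uses sampled `(next_state, reward, done)` tuples. It never reads the slip probability or the reward rule inside `step`, and that is what makes it model-free.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is terminal
LEFT, RIGHT = 0, 1
ACTIONS = (LEFT, RIGHT)


def step(state, action, rng, slip=0.1):
    """Hypothetical black-box environment (not part of the learner).

    The intended move succeeds with probability 1 - slip; otherwise the agent
    moves the opposite way. Entering state 4 gives reward 1 and ends the
    episode. The learner below only sees the samples returned here.
    """
    move = 1 if action == RIGHT else -1
    if rng.random() < slip:
        move = -move
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q_row, rng):
    """Return an action with the highest Q-value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)      # explore
            else:
                action = greedy(q[state], rng)    # exploit
            next_state, reward, done = step(state, action, rng)
            # Off-policy TD target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    # Greedy action for each non-terminal state.
    print(["right" if row[RIGHT] > row[LEFT] else "left" for row in q[:-1]])
```

With these settings the learned greedy policy should move right in every non-terminal state, although the exact Q-values depend on the seed. A model-based method such as value iteration would instead need the slip probability and the reward rule as explicit inputs.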

Key model-free reinforcement learning algorithms

{| class="wikitable sortable" style="font-size: 96%;"
! Algorithm !! class="unsortable" | Description !! class="unsortable" | Model !! Policy !! class="unsortable" | Action Space !! class="unsortable" | State Space !! Operator
|-
! scope="row" | DQN
| Deep Q-Network || Model-Free || Off-policy || Discrete || Continuous || Q-value
|-
! scope="row" | DDPG
| Deep Deterministic Policy Gradient || Model-Free || Off-policy || Continuous || Continuous || Q-value
|-
! scope="row" | A3C
| Asynchronous Advantage Actor-Critic || Model-Free || On-policy || Continuous or Discrete || Continuous || Advantage
|-
! scope="row" | TRPO
| Trust Region Policy Optimization || Model-Free || On-policy || Continuous or Discrete || Continuous || Advantage
|-
! scope="row" | PPO
| Proximal Policy Optimization || Model-Free || On-policy || Continuous or Discrete || Continuous || Advantage
|-
! scope="row" | TD3
| Twin Delayed Deep Deterministic Policy Gradient || Model-Free || Off-policy || Continuous || Continuous || Q-value
|-
! scope="row" | SAC
| Soft Actor-Critic || Model-Free || Off-policy || Continuous || Continuous || Q-value
|}
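The "Operator" column roughly indicates which quantity an algorithm's update is built around. The hedged sketch below contrasts two such quantities; the function names and numbers are hypothetical. The first is a DQN-style one-step Q-value target, r + γ·max_a' Q(s', a'). The second is generalized advantage estimation (GAE), an advantage estimator commonly paired with PPO. DDPG and TD3 replace the max with the action chosen by their deterministic actor, and TD3 additionally takes the minimum of two critics. Those details are omitted here.

```python
from typing import List, Sequence


def q_value_target(reward: float, next_q_values: Sequence[float],
                   gamma: float, done: bool) -> float:
    """DQN-style one-step target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_values)


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   last_value: float, gamma: float = 0.99,
                   lam: float = 0.95) -> List[float]:
    """Generalized advantage estimates for one rollout with no terminations.

    values[t] is the critic's V(s_t); last_value is V(s_T), used to bootstrap
    after the final step. lam=0 gives one-step TD errors; lam=1 gives the
    discounted return minus V(s_t).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    print(q_value_target(0.0, [0.2, 0.7], gamma=0.9, done=False))
    print(gae_advantages([1.0, 0.0, 1.0], [0.5, 0.4, 0.6], 0.0, gamma=0.9, lam=1.0))
```

With `lam=1.0`, each estimate is the discounted return from step t minus V(s_t). With `lam=0.0`, it reduces to the one-step TD error. Intermediate values trade bias against variance.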
